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We study a space-time finite element approach for the nonhomogeneous wave equation using a 
continuous time Galerkin method. We present fully implicit examples in 1-1-1, 2 + 1, and 3 + 1 
dimensions using linear quadrilateral, hexahedral, and tesseractic elements. Krylov solvers with 
additive Schwarz preconditioning are used for solving the linear system. We introduce a time 
decomposition strategy in preconditioning which significantly improves performance when compared 
with unpreconditioned cases. 



I. INTRODUCTION 



Space-time finite elements provide some natural advantages for numerical relativity. With space-time elements, 
time-varying computational domains are straightforward, higher-order approaches are easily formulated, and both 
time and spatial domains can be discretized using a single unstructured mesh. However, while continuous Galerkin 
approaches employing space-time finite elements have found use in many engineering applications Q, 0, 0, B, H, 
they have not been used in numerical relativity. Recent numerical relativity evolutions using finite elements employ 
discretization of the space domain and marching in time rather than simultaneous discretization of both space and 
time domains 0, 0, 

We investigate a space-time finite element method similar to ^ using continuous approximation functions in both 
space and time to explore its use for numerical relativity simulations. The main purpose of this paper is to present our 
numerical results. We present a time-parallel preconditioning strategy for use with continuous space- time elements 
and Krylov solvers, and explore numerical results in 1 + 1 dimensions and higher. 

Many space-time approaches to the wave equation exist (see Our approach is different in 

that we do not use time slab finite elements, which are continuous in a limited domain of time (the time slab) 
but discontinuous between neighboring time slabs. Instead, we discretize space and time together for the entire 
domain using a finite element space which does not discriminate between space and time basis functions and consider 
iterative solution methods with a time decomposition preconditioner. This approach has advantages for more general 
finite element spaces and parallelization. In this paper, however, we restrict ourselves to structured space-time finite 
elements and present the results obtained on a single processor in order to better compare results and performance 
with other approaches to solving the wave equation. 

We consider the following nonhomogeneous wave equation problem with the initial and boundary value problem to 
find ii(x, t) such that 

— -Au = f in l]x[0,T], (1) 

u = on Q X {t = 0}, 
ut = on Q X {t = 0}, 
Un = on dnx[0,T], 

where ^7 is a bounded domain in i?^, d = 1, 2, 3 and Un is the outward pointing normal derivative. As in Eq. (^) 
is re- written to be first order in time by introducing an auxiliary variable, v = Ut'. 
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We use a nonhomogeneous Dirichlet boundary condition on the initial boundary Q x {t = 0}, and a homogeneous 
Neumann boundary condition for dCt x (0, T]. No boundary condition is set at x {t = T} to avoid overspecifying the 
problem. Consequently, the evolution equations themselves become an effective boundary condition by determining 
the values for the solution at x {t = T}. 

The space L'^{Q) is defined as the closure of C^(r^) in the norm, 

IMmn) = (^J^ \u\^dx^ < oo. 
The i^^-seminorm and norm of u ^ H^^Q) are, respectively, 

J r2 

We define the Hilbert space L^{[0,T], H^{n)) by 

1/2 



For the space-time finite element space of n = 1,2,3 spatial dimensions, we consider the standard finite element 
space of n + 1 dimensions. Therefore our finite element space V is the space of piecewise polynomial functions 
(j) : n X {0,T] ^ R. 

The weak form is to find approximate solutions u^v e L^([0, T], such that 

M{u,v,(l)) = (3) 

N{u,v,^) = y^eL\[0,T],H\Q)), (4) 

where 

M(^,i;,(/)) = / f^^^^u.^^-fAds, (5) 

N{u,v,^) = [ f-^d^^vAds. (6) 
Jnx[o,T] \ ot J 

Motivated by the success of domain decomposition methods for general sparse matrices |T^, we also 

examine additive Schwarz methods 17], 18], [2^, [2^ with a time decomposition preconditioning strategy. 

While additive Schwarz preconditioning has been applied to hyperbolic problems before applying additive 

Schwarz preconditioning to space-time elements using a time decomposition strategy is unique to this work. 



II. NUMERICAL RESULTS 



In this section we present solutions to the nonhomogeneous wave equation using space-time elements in various 
dimensions. We use uniform structured meshes to better compare results with other approaches to solving the wave 
equation. Solutions presented are produced by a single linear solve of the system in Eqs. (|S|)-(Q. All codes presented 
use PETSc [i^, [2^, [2^; the linear solve residuals given (labeled "Final Residuals") are the absolute residual norms, 

r=\\Ax-h\\L, (7) 

for the linear system Ax = b where A is the system matrix, x is the solution, and b is the system right hand side vector 
for both u and v. We use the L^o norm for reporting differences between the analytic and approximate solution: 

||e|U„ =max|e^ (8) 

for vector e. For Krylov solve examples, the initial guess given for the solution is always zero. 
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TABLE I: Convergence using LU decomposition of a space-time element simulation with solution given by Eq. @. There 
are {ux — l)(^t — 1) total elements in the mesh. The number of nodes in the x and t directions are Ux and nt, respectively. 
Using linear elements, we expect second order convergence in the Loo norm. The convergence rates reported are given by 
rate = \n{E2/Ei) /\n{h2/hi) where hi,h2, Ei, E2 are the successive quadrilateral lengths and Loo norms, respectively. 
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TABLE IL Various linear solver tests to solve the 1 + 1 problem on a 60^ structured mesh. GMRES performed the best, but 
required a very large number of iterations. LSQR is similar to a direct method and cannot be preconditioned in PETSc. Jacobi 
and Block- Jacobi preconditioning made GMRES convergence even worse than unpreconditioned. 



A. 1 + 1 Dimensions 



For 1 + 1 dimensions, we consider the nonhomogeneous wave equation with solution 



on a domain of : 



Ue (x, t) = exp — {x — cos t) 
-5,5] and t = [0, 10]. We choose the appropriate source term, /, in Eq. Q 



/ 



-2 (cost) e' 



-{x—cos t)^ 



(2 cos^ t — 4:X cos^ t + 2 COS t — 2 cos t 



(9) 



(10) 



and initial conditions to produce this test problem solution. Solving this system via LU decomposition with linear 
rectangular elements we observe the expected second order convergence, shown in Table U Since scaling with problem 
size using LU decomposition for a banded matrix is O (A^6^) - where N is the size of the matrix and b is the bandwidth 
- LU is entirely inadequate for large problems with space- time elements. Krylov solvers (2^], like GMRES are 
much more suitable for such problems. 

We tested a variety of solvers and preconditioners available in PETSc [2^, [2^ for the problem using a 60^ 
structured mesh. The results are summarized in Table HH While GMRES converges without preconditioning, it 
requires a high number of iterations to obtain a physically meaningful result. Preconditioning with Jacobi or Block- 
Jacobi does not improve the convergence rate. However, neither Jacobi nor Block- Jacobi preconditioning offer much 
flexibility with respect to the geometry of the problem. Additive Schwarz offers more flexibility in preconditioning 
this hyperbolic problem. 

We follow a time decomposition strategy for additive Schwarz, as illustrated in Figure ^ The domain of the problem 
is split into separate subdomains of time slabs. Each subdomain overlaps the face cells of its neighbors. The much 
smaller linear system of each subdomain is subsequently solved, either by GMRES or by LU decomposition, and 
the result used for preconditioning the global system. We respect the original boundary conditions for the subspace 
interface condition: Dirichlet for t = {tn-i — overlap) and evolution equation determined for t = {tn -\- overlap) where 
tn is the n-th time decomposition. 

Results for time decomposition of a 60^ mesh are summarized in Table 11111 Figure 13 shows plots of the solution after 
1, 10, 100, and 500 GMRES iterations for the twelve subdomain additive Schwarz case. Subdomains were defined for 
this case by equally dividing up the global domain into time slabs consisting of 5 or 6 nodes each in the time direction. 

The additive Schwarz preconditioner gives excellent performance compared to GMRES alone and provides a scalable 
alternative to LU decomposition for large problems. Furthermore, the additive Schwarz preconditioner is already 
suitable for time-parallel computation; each processor would take a portion of the time subdomain in preconditioning. 
Spatial domain decomposition could also be explored in connection with time decomposition; however, we restrict our 
attention to time decomposition here. 
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TABLE III: GMRES results using additive Schwarz method (ASM) preconditioning with a time decomposition strategy for 
the 1 + 1 dimension case using a 60^ structured mesh. The column labeled "Final Residual" gives the absolute residual norm 
for the linear solve. All the results show significant improvement over the comparable unpreconditioned GMRES case shown 
in Table UTI Increasing the number of time subdomains generally requires more GMRES iterations to achieve comparable error 
residuals; however, the preconditioner is potentially faster with more subdomains. Also, the preconditioner would be more 
scalable in parallel when using more subdomains. 




FIG. 1: The 60^ mesh used for the 1 + 1 dimension simulations. Here the entire mesh is divided into five subdomains in time 
for use in preconditioning. The linear systems resulting from each subdomain are solved and the result used for preconditioning 
the global linear system. The subspace interface condition is the same as for the original boundary conditions: Dirichlet for 
t = (tn-i — overlap) and evolution equation determined for t = (tn + overlap) where tn is the n-th time decomposition. This 
preconditioner is also time-parallel: each time subdomain could be solved simultaneously on a different processor. Spatial 
domain decomposition could also be applied, but we only examine time decomposition here. 



B. 2 + 1 Dimensions 



For 2 + 1 dimensions, we modify the solution to be 

Ue {x^y^t) = exp — (x — cos t)^ — (?/ + sint)^ . (11) 

on a domain of x^y = [—4,4] and t = [0,4]. The linear system is constructed using linear hexahedral elements giving 
second order convergence for the system. 

A time decomposition strategy for preconditioning is also explored in 2 + 1. Like the 1 + 1 case, employing a time 
decomposition strategy with additive Schwarz preconditioning significantly improves performance when compared to 
using GMRES alone or LU decomposition. Table HVl gives a summary of results obtained using a 40 x 40 x 20 mesh. 
Performance times given are the solve times obtained on AMD opteron 250 processor with a clock speed of 2.4 GHz 
using the PETSc timing utility. 

GMRES without preconditioning is ineffective for this problem due to the slow convergence rate. As expected, LU 
decomposition is also ineffective due to poor scaling as the problem size grows. In contrast, GMRES with additive 
Schwarz method (ASM) preconditioning using a time decomposition strategy is significantly more effective. Figure |31 
shows plots of the solution after 10, 50, 100, and 500 GMRES iterations for the five subdomain ASM preconditioned 
case. As in the 1 + 1 cases, the ASM preconditioner is time-parallel: parallelization can be achieved by simultaneously 
solving each time subdomain on a different processor. 
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After 1 iteration 



After 10 iterations 




After 100 iterations 



After 500 iterations 




FIG. 2: An additive Schwarz preconditioned example using twelve subdomains in time on a 60^ structured mesh, referenced in 
Table llTTI The plots show the solution after 1,10,100, and 500 GMRES iterations. Unlike time marching methods, the solution 
is constructed at all times simultaneously. The preconditioner substantially speeds up this process; evidence of the twelve 
additive Schwarz time subdomains is apparent after the first iteration of GMRES. 



C. 3 + 1 Dimensions 



For 3 + 1 dimensions, we select the solution to be 



Ue {x,y,z,t) = exp 



cost 



sin t 



■ cost 



(12) 



on a domain of x^y^z = [—2.5,2.5] and t = [0,5]. The linear system is constructed using linear tesseractic ele- 
ments consisting of 16 nodes per element, giving second order convergence for the system. Tesseracts are the higher 
dimensional analogue of hexahedra 29]. Table IVl gives a summary of results obtained using a 18^ mesh. 

Figures show plots of the solution at selected time slices; Figure shows the nine subdomain additive Schwarz 
preconditioned case at 10, 50, 100, and 750 GMRES iterations. 



III. CONCLUSIONS 



We have numerically examined space-time finite elements for the nonhomogeneous wave equation, testing several 
types of linear solvers and preconditioner s in several dimensions. The motivation of this study is to explore the per- 
formance issues surrounding the use of space-time elements in the context of numerical relativity. Fully unstructured 
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TABLE IV: Linear solve results for the 2+1 dimension case using a 40 x 40 x 20 structured mesh. The column labeled "Final 
Residual" gives the absolute residual norm for the linear solve. The additive Schwarz preconditioned cases show significant 
performance gain over the comparable unpreconditioned GMRES case and LU case. The performance times, given in seconds, 
were obtained using the PETSc timing utility running on an AMD opteron 250 processor. The GMRES simulations required 
significantly less memory than LU decomposition. Figure El shows the solution for the five subdomain ASM case after 10, 50, 
100, and 500 GMRES iterations. 



After 10 iterations After 50 iterations 




After 100 iterations After 500 iterations 




FIG. 3: An additive Schwarz preconditioned example in 2 + 1 dimensions using five subdomains in time on a 40 x 40 x 20 
structured mesh, referenced in Table llVl The plots show the solution after 10,50,100, and 500 GMRES iterations. The isosurface 
indicates a surface with value of 0.8, tracking the motion of the pulse in time. A slice of the solution at time 3 is also shown 
with contour lines on the slice. The vertical axis is the time direction. Like Figure El in the 1 + 1 dimension case, the solution is 
constructed at all timesteps at once rather than sequentially solving a single timestep at a time as in time marching methods 




FIG. 4: Initial data for 3+1 simulation in Figure El 
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100 iterations: 



750 iterations: 




FIG. 5: The additive Schwarz preconditioned 3+1 example using nine subdomains in time on a 18^ structured mesh referred to 
in Table Ivl The initial data for this simulation is shown in Figure 0] The plots here show the solution, by row, after 10,50,100, 
and 750 GMRES iterations. The columns indicate the time of the solution: the left column shows the solution at time 1, the 
middle column shows time 3, and the right column shows time 5. An isosurface with value 0.8 tracks the motion of the pulse. 
Two spatial planes with contours are also shown: the first on the x = — 1 plane, and the second on the z = —1 plane. Like 
Figures 13 and El the solution is constructed at all times simultaneously. The mesh consists of tesseracts, the higher dimensional 
analogue of hexahedra. The 3-D datasets shown are time slices from the 4-D mesh. 



meshes in space and time can greatly simplify issues surrounding time- varying computational domains and space-time 
mesh refinement, provided that both the domain and refinement are specified a priori. They have also shown promise 
when the time- varying domain is not known a priori, as in 30] and 31]. We restricted our attention to those sim- 
ulations which could be performed on a single processor. Fully implicit examples using a continuous time Galerkin 
method were presented in 1 + 1, 2 + 1, and 3 + 1 dimensions using linear quadrilateral, hexahedral, and tesseractic 
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TABLE V: Linear solve results for the 3+1 dimension case using an 18^^ structured mesh. The column labeled "Final Residual" 
gives the absolute residual norm for the linear solve. Unlike the 1 + 1 and 2 + 1 cases, these 3+1 examples were only possible 
on a single processor because of the additive Schwarz preconditioning; both LU decomposition and unpreconditioned GMRES 
were impractical because of memory limitations or time limitations. These 3 + 1 simulations used a mesh composed of linear 
tesseracts: 16 node 4-D hyperelements. Figure 0] shows the initial data used for these simulations; Figure |5] shows the solution 
for the nine subdomain ASM case after 10, 50, 100, and 750 GMRES iterations at three times: 1,3, and 5. 



elements. 

We found that LU decomposition and unpreconditioned GMRES were both capable of solving the linear systems 
which appear in these space-time element simulations. However, both choices scaled too poorly with respect to 
problem size to be effective even for moderate size simulations in 3 + 1. Standard preconditioners like Jacobi and 
Block-Jacobi did not improve GMRES performance for the space-time linear systems. 

We found that additive Schwarz preconditioning significantly improved GMRES performance. Substantial perfor- 
mance improvements were observed by applying a time decomposition strategy in additive Schwarz preconditioning. 
The time decomposition strategy consisted of decomposing the global mesh into several smaller time subdomains 
for use in preconditioning. This preconditioning strategy is also time-parallel: all the time subdomains used in 
preconditioning can be solved simultaneously on separate processors. 

Several improvements upon the additive Schwarz preconditioner remain to be explored. In the experiments presented 
here, only face cell overlap was examined. Also, no attempt was made to combine time decomposition with spatial 
domain decomposition even though such a combination would be natural. A study of the optimal interface condition 
32] is another interesting question since the interface condition explored here was physically motivated. Attempts 
at a parallel implementation of the preconditioner will be forthcoming. The substantial performance benefits of the 
ASM preconditioner make further study into space-time elements for numerical relativity feasible. 
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